Abstract: The repair locality of a distributed storage code is
the maximum number of nodes that ever needs to be contacted 
during the repair of a failed node. Having small repair locality is 
desirable, since it is proportional to the number of disk accesses 
during repair. However, recent publications show that small 
repair locality comes with a penalty in terms of code distance or 
storage overhead if exact repair is required. 

Here, we first review some of the main results on storage 
codes under various repair regimes and discuss the recent work 
on possible (information-theoretical) trade-offs between repair 
locality and other code parameters like storage overhead and 
code distance, under the exact repair regime. 

Then we present some new information theoretical lower 
bounds on the storage overhead as a function of the repair 
locality, valid for all common coding and repair models. In 
particular, we show that if each of the $n$ nodes in a distributed
storage system has storage capacity $\alpha$ and if, at any time, a
failed node can be functionally repaired by contacting some set
of $r$ nodes (which may depend on the actual state of the system)
and downloading an amount $\beta$ of data from each, then in the
extreme cases where $\alpha = \beta$ or $\alpha = r\beta$, the maximal
coding rate is at most $r/(r + 1)$ or $1/2$, respectively (that is,
the excess storage overhead is at least $1/r$ or $1$, respectively).

I. Introduction 

A study sponsored by the storage company EMC found that
the world's data is doubling every two years, and estimated it
at 1.8 zettabytes (1.8 trillion gigabytes) in 2011.^1 Given these
enormous volumes, the importance of efficient data storage can 
hardly be overestimated. These huge amounts of data need 
to be stored and reliably maintained over time, while being 
stored on individually unreliable components. To guarantee 
data survival over time, redundancy must be introduced. In 
distributed storage systems (DSS), typically data objects are 
stored in encoded form onto multiple storage units or storage 
nodes. In older DSS, data blocks were simply replicated, but
the current, enormous scale of operations demands the use
of more sophisticated erasure coding techniques. Currently,
Reed-Solomon codes and other erasure codes are employed
in cloud environments like Microsoft Windows Azure Storage
[1], and in peer-to-peer storage systems like Wuala, Cleversafe,
Oceanstore, and TotalRecall, see e.g., [2], [3], and references
therein.

^1 http://www.emc.com/about/news/press/2011/20110628-01.htm

The use of erasure codes potentially affords orders of magnitude
greater reliability while requiring less storage overhead,
but to achieve this potential, it is of crucial importance to
find efficient solutions for the repair problem, the problem
of maintaining system reliability in the presence of node
failures. Over time, storage nodes will leave the system due to 
node failures, caused for example by hardware failures (i.e., 
disk failures) or software updates in data centers, or peer 
churning in peer-to-peer systems. Under the simplest and most 
straightforward repair regime called exact repair, each data 
block stored on a failed node has to be exactly reconstructed 
and stored on a newcomer node. In a more subtle repair regime 
called functional repair, we do not require that the newcomer 
stores an exact copy of the lost data block, but typically 
the data block stored in the newcomer node will be some
linear combination of the data blocks in the other nodes, not
necessarily exactly equal to the lost data block but enabling 
recovery of the originally stored information in combination 
with the data blocks on the other nodes (later, we will discuss 
an example). 

Various performance metrics for repair efficiency have been
considered. The total amount of information communicated
during repair (called the repair bandwidth [4]) has received
the most attention, and is currently best understood. However,
for certain applications like cloud storage and deep archival,
minimizing disk I/O seems more valuable [5]. Since the disk
I/O is proportional to the number of nodes contacted during
repair of a failed node, the repair locality of a storage code
has recently emerged as an important parameter.

In this paper, we first present a brief overview of the cutset
bound and regenerating codes from [4], discussing various
types of storage codes along the way. Then we review the
recent work on repair locality and present some new results.
We end by suggesting some directions for further research.

For a general, more complete overview of DSS and storage
codes, we refer to [2], [3], and to the Storage Wiki [6].

II. Regenerating codes 

Assume that a data object is stored in encoded form across
$n$ storage nodes of a DSS, with each of the nodes storing one
data block, an amount $\alpha$ of data of the encoded object. When
a node fails, a newcomer node is allowed to contact any set
of $r$ live nodes and to download an amount $\beta$ of data from
each of them in order to regenerate some of the lost data, in
the form of a replacement block, again containing an amount
$\alpha$ of data. This number $r$ is referred to as the repair locality
or the fan-in of the repair process. (Note that in many earlier
publications the letter $d$ is used instead.) We require, and this
is essential, that this regeneration process ensures that a data
collector can reconstruct the original data object, at any time
during this process, from any $k$ of the resulting data blocks in
the current $n$ live nodes, for some number $k$. In what follows,
we assume that $k$ is the smallest number with this property;
note that then $k \le r \le n - 1$. Now the question that arises is:
how much information can be stored given these assumptions?

The repair problem can be abstracted in terms of an information
flow network, where a new node $v$, having storage
capacity $\alpha$, is represented by a capacitated edge
$v^{\mathrm{in}} \to v^{\mathrm{out}}$ and with $r$ capacitated edges
$w^{\mathrm{out}} \to v^{\mathrm{in}}$, each of capacity $\beta$,
representing the data flow from the nodes assisting in the
repair towards the new node during regeneration. Now the
problem is reduced to a multicasting problem on this network.
Network flow theory can be used to investigate the maximum
possible flow of information towards a set of $k$ nodes used
for data recovery by looking for possible bottlenecks in the
network. In the breakthrough paper [4], it is shown by such
a maxflow-mincut argument that the maximum amount $m$ of
information that can be stored satisfies the cutset bound

$$ m \le \sum_{j=0}^{k-1} \min\{(r - j)\beta, \alpha\}. \qquad (1) $$

Storage codes for this model that meet the above bound
are called Regenerating Codes. Two types of Regenerating
Codes are of special interest, one corresponding to the point
of optimal storage efficiency and the other to the point of
optimal repair bandwidth efficiency. Since any $k$ nodes contain
all available information, nodes must have storage capacity
$\alpha \ge m/k$. Regenerating Codes with $\alpha = m/k$ minimize
the required amount of storage among regenerating $(n, k, r)$
codes; such codes are called Minimum Storage Regenerating
(MSR) codes. They are characterized by having
$\alpha = m/k = (r - k + 1)\beta$. On the other hand, a data amount
of at most $\gamma = r\beta$ is available during repair of a node, so
that $r\beta \ge \alpha$. Codes with $r\beta = \alpha$ minimize the repair
bandwidth $\gamma = r\beta$ among regenerating $(n, k, r)$ codes; such
codes are called Minimum Bandwidth Regenerating (MBR) codes.
They are characterized by having $\alpha = r\beta$ and
$m/\beta = kr - \binom{k}{2}$.
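
As a quick numerical check of the cutset bound (1) and of the two extreme points, the following Python sketch evaluates the right-hand side of (1), read as $\sum_{j=0}^{k-1} \min\{(r-j)\beta, \alpha\}$, and compares it with the MSR and MBR characterizations above. The function names `cutset_bound`, `msr_point`, and `mbr_point` are illustrative choices made here, not notation from [4], and integer values of $\alpha$ and $\beta$ are assumed for simplicity.

```python
from math import comb


def cutset_bound(k, r, alpha, beta):
    """Right-hand side of (1): sum over j = 0..k-1 of min((r - j) * beta, alpha)."""
    return sum(min((r - j) * beta, alpha) for j in range(k))


def msr_point(k, r, beta=1):
    """Return (alpha, m) at the MSR point: alpha = m / k = (r - k + 1) * beta."""
    alpha = (r - k + 1) * beta
    return alpha, k * alpha


def mbr_point(k, r, beta=1):
    """Return (alpha, m) at the MBR point: alpha = r * beta, m / beta = k*r - C(k, 2)."""
    alpha = r * beta
    return alpha, (k * r - comb(k, 2)) * beta


# Both extreme points meet the cutset bound for every k <= r in a small range.
for r in range(1, 8):
    for k in range(1, r + 1):
        for point in (msr_point, mbr_point):
            alpha, m = point(k, r)
            assert cutset_bound(k, r, alpha, 1) == m
```

For instance, `cutset_bound(3, 3, 2, 1)` evaluates to 5, which is neither the MSR nor the MBR value of $m$ for $k = r = 3$; this illustrates that the bound also has interior points.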

The existence of Regenerating Codes (and even of linear 
ones) for all feasible parameter sets (i.e., with $n > r \ge k$,
assuming the value for $k$ is minimal) essentially follows from
results in Network Coding ID, but explicit constructions are 
not immediately available, nor are they obvious. Moreover, 
to make matters worse, strictly speaking the said results only 
guarantee existence of these codes for functional repair of 
a number of node failures that is bounded over time, if a 
sufficiently large field is employed. It would be hard to imagine 
that the boundedness condition is really essential, and indeed 
it has been lifted, first for r = n — 1 in Q, and later for all 
parameter sets on the cutset bound in [8]. The resulting codes
do prove existence, but are not practical. 

Fortunately, for the important cases of MSR and MBR
codes, as well as for some other cases, explicit constructions
are now known for functional repair, and in many cases also
for exact repair. Before we review these results, we discuss
a useful abstract description of storage codes for exact and
functional repair, and provide some examples.

III. Linear storage codes 

Recall that, under the exact repair regime, each data block 
on a failed storage node has to be exactly reconstructed and 
stored on a newcomer node. Just as linear error-correcting or 
erasure-correcting codes are best thought of simply as vector 
spaces over a finite field, whose properties relative to notions 
like (Hamming) distance can then be studied, we believe that 
linear distributed storage codes for the exact-repair regime 
are best thought of in a similar way, now as a collection 
of subspaces of a fixed vector space, for which then similar 
appropriate notions can be introduced and investigated. So 
we will first introduce them in this way, along with various 
relevant notions. Then, we will explain how to use a storage 
code to actually store and maintain information, and discuss 
some examples to illustrate the concepts. Our approach should 
be compared to the one found, for example, in [9] or [10].

A. Exact-repair storage codes as collections of vector spaces 

A linear exact-repair distributed storage code (LERSC)
$\mathcal{U}$ with parameters $(m; n, \alpha)$ over a (finite) field $F$
is a collection of $n$ subspaces $U_1, \ldots, U_n$ of an
$m$-dimensional vector space $U$ over $F$, each of dimension $\alpha$.
We will refer to the space $U$ as the message space and to the
$U_i$ as the storage spaces. The integer $\alpha$ is called the
storage capacity.

A subset $K$ of the storage spaces is called a recovery set of
the storage code $\mathcal{U}$ if these subspaces together span the
entire vector space $U$. Here, the span of a collection of vector
spaces $A_1, \ldots, A_h$ is the collection of all vectors
$a_1 + \cdots + a_h$ with $a_i \in A_i$ for all $i$, that is, the
smallest vector space containing all the vector spaces
$A_1, \ldots, A_h$. The recovery dimension $k = k(\mathcal{U})$ of
$\mathcal{U}$ is defined as the size of the smallest recovery set of
$\mathcal{U}$.

Given some positive integer $\beta$, referred to as the transport
capacity, we say that a collection $R$ of subspaces of the storage
code is a repair set for a certain subspace $U_\ell \notin R$ if it
is possible to choose a $\beta$-dimensional repair space
$W_{i,\ell} \subseteq U_i$ for each $U_i \in R$ such that $U_\ell$ is
contained in the span of the repair spaces $W_{i,\ell}$. If each
subspace in the storage code has a repair set of size $r$ w.r.t.
transport capacity $\beta$, then we say that the code $\mathcal{U}$ has
repair locality $r$ with respect to transport capacity $\beta$. We
will refer to a storage code $\mathcal{U}$ with all the above
parameters as an $(m; n, k, r, \alpha, \beta)$-storage code.

Now let us see how such a storage code $\mathcal{U}$ can be used to
actually store and maintain information. The information to be
stored will be represented by a vector $x \in U$. So, for example,
if $F$ has size $q = 2^h$, then $x$ represents a file consisting of
$mh$ bits, grouped into $m$ symbols of $h$ bits each. Now consider a
DSS consisting of $n$ storage units or storage nodes
$v_1, \ldots, v_n$. In each subspace $U_i$ of $\mathcal{U}$ we choose a
basis $b_{i,1}, \ldots, b_{i,\alpha}$, represented by the
$\alpha \times m$ matrix $B_i = [b_{i,1} \cdots b_{i,\alpha}]^{\top}$.
Then, we associate the subspace $U_i$ with storage node $v_i$, and
use this node to store the $\alpha$ symbols of the vector $B_i x$,
that is, in $v_i$ we store the $\alpha$ inner products of $x$ with the
basis vectors of $U_i$. Using only simple linear algebra, it is
easily seen that indeed a data collector can recover the vector $x$
by collecting the set of vectors $B_i x$ stored in a subset $K$ of
the nodes if (and only if) these nodes constitute a recovery set of
$\mathcal{U}$. (Here it is of course assumed that the choice of the
matrices $B_i$ is known to the data collector.)

Similarly, given a repair set $R$ for a node $v_\ell$ w.r.t.
transport capacity $\beta$, we choose a fixed basis in each repair
space $W_{i,\ell}$ inside subspace $U_i \in R$, represented by a
$\beta \times \alpha$ repair matrix $T_{i,\ell}$ whose rows express
this basis in terms of the chosen basis of $U_i$. Again, it is
easily seen that (a) each node $v_i \in R$ can compute
$T_{i,\ell} B_i x$ from the vector $B_i x$ stored in $v_i$ and
(b) node $v_\ell$ can recompute the vector $B_\ell x$ from the
vectors $T_{i,\ell} B_i x$ collected during repair from the nodes
in the repair set $R$. (Here, we assume that the choice of the
repair matrices $T_{i,\ell}$ is known in node $v_\ell$.)

Note that a storage code as above has a coding rate
$R(\mathcal{U}) = m/(n\alpha)$ and excess storage overhead
$\omega(\mathcal{U}) = 1/R(\mathcal{U}) - 1 = (n\alpha - m)/m$.

Example 3.1: Consider the storage code
$\mathcal{U} = \{U_0, U_1, U_2, U_3\}$ over the binary field
$\mathbb{F}_2$ with $U_0 = \langle e_0, e_2 + e_3 \rangle$,
$U_1 = \langle e_1, e_3 + e_0 \rangle$,
$U_2 = \langle e_2, e_0 + e_1 \rangle$, and
$U_3 = \langle e_3, e_1 + e_2 \rangle$, considered as subspaces of
$U = \mathbb{F}_2^4$. Here, we write $\langle a_1, \ldots, a_h \rangle$
to denote the span of the vectors $a_1, \ldots, a_h$, the vector
space consisting of all linear combinations of $a_1, \ldots, a_h$.
We claim that the code $\mathcal{U}$ is an
$(m = 4; n = 4, k = 2, r = 3, \alpha = 2, \beta = 1)$ linear
exact-repair storage code (LERSC). Indeed, there are $n = 4$
subspaces $U_i$, each of dimension $\alpha = 2$. Furthermore, $U$
has dimension $m = 4$, and $k = 2$ since any two subspaces
intersect trivially, so together span $U$. To repair node $v_0$
using the size $r = 3$ repair set $R = \{U_1, U_2, U_3\}$, we choose
repair spaces $W_{1,0} = \langle e_0 + e_3 \rangle \subseteq U_1$,
$W_{2,0} = \langle e_2 \rangle \subseteq U_2$, and
$W_{3,0} = \langle e_3 \rangle \subseteq U_3$, each of dimension
$\beta = 1$. Note that this choice is valid since indeed
$U_0 \subseteq \langle e_0 + e_3, e_2, e_3 \rangle$. The storage code
$\mathcal{U}$ is invariant under the linear transformation given by
$e_i \mapsto e_{i+1}$ (indices modulo 4), so the repair spaces
for other nodes can be obtained by symmetry. With the
bases as suggested by the above description, this code stores
a vector $x = (x_0, \ldots, x_3)$ by letting node 0 hold $x_0$ and
$x_2 + x_3$, and repairs node 0 by downloading $x_0 + x_3$ from
node 1, $x_2$ from node 2, and $x_3$ from node 3. □
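
The claims of Example 3.1 can be checked mechanically. The sketch below is an illustration added under a few assumptions: vectors of $\mathbb{F}_2^4$ are encoded as Python integers (bit $i$ is the coefficient of $e_i$), the storage spaces are read as $U_0 = \langle e_0, e_2 + e_3 \rangle$, $U_1 = \langle e_1, e_3 + e_0 \rangle$, $U_2 = \langle e_2, e_0 + e_1 \rangle$, $U_3 = \langle e_3, e_1 + e_2 \rangle$, and the helper names `gf2_rank` and `dot` are hypothetical.

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integers (bit i = coefficient of e_i)."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def dot(u, x):
    """Inner product over F_2 of two vectors encoded as integers."""
    return bin(u & x).count("1") % 2


e = [1 << i for i in range(4)]  # standard basis e[0], ..., e[3]
U = [
    [e[0], e[2] ^ e[3]],
    [e[1], e[3] ^ e[0]],
    [e[2], e[0] ^ e[1]],
    [e[3], e[1] ^ e[2]],
]

# k = 2: any two storage spaces together span the 4-dimensional message space.
pairs_span = all(gf2_rank(U[i] + U[j]) == 4 for i, j in combinations(range(4), 2))

# Repair of node 0 with beta = 1: one vector taken from each of U_1, U_2, U_3.
W = [e[0] ^ e[3], e[2], e[3]]
helpers_ok = all(gf2_rank(U[i + 1] + [W[i]]) == 2 for i in range(3))
repair_ok = gf2_rank(W + U[0]) == gf2_rank(W)  # U_0 lies in the span of W

# Store a message x: node i holds the inner products of x with its basis vectors.
x = 0b1011  # x_0 = 1, x_1 = 1, x_2 = 0, x_3 = 1
stored = [[dot(b, x) for b in basis] for basis in U]
downloads = [dot(w, x) for w in W]  # x_0 + x_3, x_2, x_3
# Node 0 recomputes x_0 = (x_0 + x_3) + x_3 and x_2 + x_3 from the downloads.
rebuilt = [downloads[0] ^ downloads[2], downloads[1] ^ downloads[2]]
```

With these bases every helper node sends one of its stored symbols unchanged, so no computation is needed on the helper side; only the newcomer combines what it receives.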

A linear transformation fixing the storage code such as
the cyclic shift in the example above could be termed an
automorphism of the code. The notion of code automorphisms
has been very fruitful in the field of error-correcting codes,
where it has led to the discovery of several important classes
of codes such as cyclic codes, of which Reed-Solomon codes
are a special case. But in contrast, until now symmetry has
not played a significant role in storage codes. It might be of
interest to systematically search for storage codes with extra
symmetries.

A LERSC $\mathcal{U}$ is essentially determined by the subspaces
contained in $\mathcal{U}$; however, as seen above, the actual
implementation of the code also depends on the choice of bases in
the various spaces. This choice can have a crucial influence on
the performance of the code. Ideally, each repair subspace is
spanned by a subset of the basis in the node; in that case,
during repair each node simply transfers a subset of its data,
so that no computations are required. This situation, referred
to as repair-by-transfer, is illustrated below.

Example 3.2: We construct a simple binary rate-(1/2)
repair-by-transfer
$(m = \binom{n}{2}; n, k = n - 1, r = n - 1, \alpha = n - 1, \beta = 1)$
storage code. (In fact, these codes are MBR codes.)
The message space $U$ has dimension $m = \binom{n}{2}$, so we can
index the coordinate positions with pairs
$\{i, j\} \subseteq \{1, \ldots, n\}$. Given a message vector
$x = (x_{\{i,j\}})$, we let node $v$ store the $\alpha = n - 1$
symbols $x_{\{v,j\}}$ ($j \ne v$). If node $v$ fails, it can be
exactly repaired by downloading symbol $x_{\{v,j\}}$ from node $j$,
for each node $j \ne v$. In other words, the node subspaces are
$U_v = \langle e_{\{v,j\}} \mid j \ne v \rangle$, with repair spaces
$W_{j,v} = \langle e_{\{j,v\}} \rangle$. □
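
The construction of Example 3.2 is simple enough to simulate directly. The following Python sketch is an illustration under stated assumptions: message symbols are indexed by unordered pairs $\{i, j\} \subseteq \{1, \ldots, n\}$, node $v$ stores the symbols $x_{\{v,j\}}$ with $j \ne v$, and the function names `build_nodes`, `repair`, and `recover` are hypothetical.

```python
import random
from itertools import combinations


def build_nodes(n, x):
    """Node v stores the n - 1 symbols x[{v, j}] for all j != v."""
    return {v: {frozenset((v, j)): x[frozenset((v, j))]
                for j in range(1, n + 1) if j != v}
            for v in range(1, n + 1)}


def repair(nodes, v, n):
    """Rebuild node v by copying the single symbol x[{v, j}] from every node j != v."""
    return {frozenset((v, j)): nodes[j][frozenset((v, j))]
            for j in range(1, n + 1) if j != v}


def recover(nodes, subset):
    """Collect all symbols held by the nodes in the given subset."""
    seen = {}
    for v in subset:
        seen.update(nodes[v])
    return seen


n = 5
pairs = [frozenset(p) for p in combinations(range(1, n + 1), 2)]
x = {p: random.randint(0, 1) for p in pairs}  # random binary message, m = C(n, 2)
nodes = build_nodes(n, x)

# Exact repair-by-transfer: the rebuilt content equals the lost content.
repairs_exact = all(repair(nodes, v, n) == nodes[v] for v in nodes)

# k = n - 1: any n - 1 nodes together hold every message symbol.
recovers = all(recover(nodes, s) == x for s in combinations(nodes, n - 1))

rate = len(pairs) / (n * (n - 1))  # m / (n * alpha)
```

Each symbol is stored on exactly two nodes, which is why the rate equals 1/2 for every $n$ and why losing one node never loses information.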

The Fractional Repetition Codes described in [11] combine
a repair-by-transfer inner code with an MDS outer code (that
is, the stored vector $x$ in the message space is itself a codeword
in an MDS code); these codes actually meet the cutset bound
(1) at the MBR point.

B. Linear functional-repair storage codes 

Under the regime of functional repair, a data block on a 
failed storage node has to be replaced by a data block on a 
newcomer that is information equivalent to the one on the 
failed node, while ensuring the possibility of future functional 
repair of other nodes. Linear distributed storage codes for 
functional repair are perhaps best thought of as a specification 
of a subspace arrangement, with the property that in any 
realization, a subspace can be "repaired" by replacing it with a 
(possibly different) subspace so that the resulting arrangement 
again satisfies the specifications. An example will help to 
illustrate the idea. 

Example 3.3: We will construct a linear functional-repair
storage code $\mathcal{U}$ with parameters
$(m = 5; n = 4, k = r = 3, \alpha = 2, \beta = 1)$, so with coding
rate $R = 5/8$. Note that this parameter set meets the cutset
bound in a point different from the MBR and MSR points.

Let $U$ be a 5-dimensional vector space over $\mathbb{F}_2$. We will
ensure that at each moment in time, the four 2-dimensional
storage subspaces $U_1, \ldots, U_4$ associated with the four storage
nodes comply with the following specification:

1) Any two of the storage spaces intersect trivially, that is,
$U_i \cap U_j = \{0\}$ when $i \ne j$;

2) Any three of the storage spaces span $U$.

Suppose that $U_1, \ldots, U_4$ satisfy these constraints, and suppose
that node 4 fails. Without loss of generality, we may assume
that $U_1 = \langle e_1, e_3 \rangle$, $U_2 = \langle e_2, e_4 \rangle$,
and $U_3 = \langle e_5, e_1 + e_2 \rangle$, for some basis
$e_1, \ldots, e_5$ of $U$. Indeed, $U_3$ must have trivial
intersection with both $U_1$ and $U_2$, but, having dimension 2,
necessarily intersects the 4-dimensional span $U_1 + U_2$, hence
this intersection is of the form $e_1 + e_2$ with
$e_i \in U_i^* = U_i \setminus \{0\}$. This shows that
$U_1, U_2, U_3$ have the indicated form. Now, to repair (or
initially construct) the storage space $U_4$, given that
$\beta = 1$ we must choose a vector $a_i \in U_i^*$ for $i = 1, 2, 3$,
and let $U_4$ be some 2-dimensional subspace of their span
$\langle a_1, a_2, a_3 \rangle$, which by rule 1 should not contain
any of the $a_i$. Hence $U_4$ is of the form
$\{0, a_1 + a_2, a_1 + a_3, a_2 + a_3\}$. Finally,
$a_3 \ne e_1 + e_2$ since otherwise $U_4 \subseteq U_1 + U_2$,
violating rule 2, and similarly, $a_1 \ne e_1$, $a_2 \ne e_2$. So
$a_1 = e_3 + x_1 e_1$, $a_2 = e_4 + x_2 e_2$,
$a_3 = e_5 + x_3 (e_1 + e_2)$, and it is now easily verified that
any choice of $x_1, x_2, x_3 \in \mathbb{F}_2$ is valid. (Initially,
we can take for example $U_4 = \langle e_3 + e_4, e_3 + e_5 \rangle$.)
This shows that we can maintain the specification forever,
provided that two nodes never fail at the same time. □
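
The final verification step of Example 3.3 can be done by brute force. The sketch below assumes the normal form $U_1 = \langle e_1, e_3 \rangle$, $U_2 = \langle e_2, e_4 \rangle$, $U_3 = \langle e_5, e_1 + e_2 \rangle$ and the repair vectors $a_1 = e_3 + x_1 e_1$, $a_2 = e_4 + x_2 e_2$, $a_3 = e_5 + x_3(e_1 + e_2)$; vectors are encoded as integers, and the helper names `gf2_rank` and `meets_specification` are hypothetical.

```python
from itertools import combinations, product


def gf2_rank(vectors):
    """Rank over F_2 of vectors encoded as integers (one bit per basis vector)."""
    pivots = {}  # leading bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


e1, e2, e3, e4, e5 = (1 << i for i in range(5))
U1, U2, U3 = [e1, e3], [e2, e4], [e5, e1 ^ e2]


def meets_specification(spaces):
    """Rule 1: pairwise trivial intersections. Rule 2: any three spaces span U."""
    rule1 = all(gf2_rank(a + b) == 4 for a, b in combinations(spaces, 2))
    rule2 = all(gf2_rank(a + b + c) == 5 for a, b, c in combinations(spaces, 3))
    return rule1 and rule2


valid_choices = []
for x1, x2, x3 in product((0, 1), repeat=3):
    a1 = e3 ^ (x1 * e1)
    a2 = e4 ^ (x2 * e2)
    a3 = e5 ^ (x3 * (e1 ^ e2))
    U4 = [a1 ^ a2, a1 ^ a3]  # basis of {0, a1 + a2, a1 + a3, a2 + a3}
    if meets_specification([U1, U2, U3, U4]):
        valid_choices.append((x1, x2, x3))
```

Rule 1 is tested via ranks: two 2-dimensional spaces intersect trivially exactly when their union has rank 4. All eight choices of $(x_1, x_2, x_3)$ pass, in line with the claim in the example.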

The use of functional-repair storage codes as above to actually 
store information is similar to that of the exact-repair storage 
codes introduced earlier, except that now at each moment the 
other nodes and the data collector have to be informed of the 
actual state of a storage node, that is, of its current storage 
space. This extra overhead can be relatively small if the code 
is used to store a large number of messages simultaneously. 

C. Existence of regenerating storage codes on the cutset 
bound 

We end this section with a brief overview of the known 
constructions and nonexistence results to date. As mentioned 
before, regenerating codes have been shown to exist for all 
parameter sets on the cutset bound (1).

For the MBR point (minimizing repair bandwidth), linear
exact-repair regenerating storage codes have been constructed
for all parameter sets in [12] using a product-matrix
construction, with a field size of the order of the number $n$ of
nodes. Exact-repair-by-transfer regenerating MBR codes have been
constructed for the case $r = n - 1$, now using field sizes of
order $n^2$ [13].

Exact-repair MSR regenerating storage codes have been
constructed for all parameter sets with $r \ge 2k - 2$ in [12]
(for some other constructions in this range, see the references
on the Storage Wiki [6]); the non-existence of exact-repair
regenerating MSR codes with $r < 2k - 2$ for the case
$\beta = 1$ (commonly referred to as "no symbol extension") was
demonstrated in [9], by showing that a phenomenon called
interference alignment necessarily must occur in such codes.
To complete the picture, [14] and [15] have shown asymptotic
existence of exact-repair regenerating MSR storage codes for
all $n, k, r$ (that is, for points arbitrarily close to the cutset
bound, for sufficiently large file sizes). Finally, functional
repair-by-transfer regenerating MSR codes for parameter sets
with $k = 2$ and $r = n - 1$ have been constructed in [16].

The paper [13] also shows the non-achievability of essentially
all interior points on the cutset bound (that is, different
from MBR and MSR) for exact repair in the case $\beta = 1$ (no
symbol extension).

IV. Repair locality in storage codes 

Application contexts like cloud storage systems and deep
archival storage require a low disk I/O overhead [5]. Since the
disk I/O is proportional to the number of nodes involved in a
repair, this makes the repair locality an important performance
metric, which was recognized in [17], [18], [19]. Codes
designed for small repair locality are for example Pyramid
codes [20], Homomorphic codes [17] and Spread codes [21],
the codes in [19], and LRC codes [22]. Some of the
repair-by-transfer codes in [11] and [23] can also be considered as
designed for this purpose.

Already in [5], it was conjectured that there are trade-offs
between recovery I/O and storage efficiency. Up to now,
bounds have been developed in the case of exact repair,
involving rate, repair locality, and code distance. For linear
$[n, k, d]$-codes, it was shown in [24] that
$n - k \ge \lceil k/r \rceil + d - 2$ (attainable for $d \ge 2$),
implying that the rate $R = k/n$ satisfies $R \le r/(r + 1)$.
A more general information-theoretical bound derived in [25]
(see also its full version [26]) states that
$d \le n - \lceil m/\alpha \rceil - \lceil m/(r\alpha) \rceil + 2$,
where $m$ is the amount of encoded information, $d$ the
"information-theoretical distance" of the code (defined as the
maximum number such that any $k = n - d + 1$ nodes can
reconstruct the stored information) and $\alpha$ the storage per
node; in the case where $(r + 1) \mid n$, a code was constructed
with $d = n - \lceil m/\alpha \rceil - \lceil m/(r\alpha) \rceil - 1$.
Again, if any failed node can be repaired at all then $d \ge 2$,
in which case the bound implies that the rate
$R = m/(n\alpha) \le r/(r + 1)$.

In all the models discussed above, a given node and all
its reincarnations are assumed to have the same, fixed repair
set of size $r$. Our aim is to investigate the trade-off between
rate and repair locality in an information flow network setting
similar to that of the cutset bound (1). Note that the cutset
bound does not depend on the requirement that every set of $k$
nodes can recover the stored information: indeed, inspection
of the proof in [4] shows that the cutset bound still holds if
we only assume that some set of $k$ nodes has this property,
as long as a newcomer node can connect to any set of $r$ live
nodes during repair. Already in [4], the question was raised
whether the mincut value could be larger if a newcomer could
choose the $r$ live nodes to connect to. It is precisely this
question that we investigate here.

So assume that we have a storage code for the functional
repair regime that can store a total amount $m$ of information
by storing an amount $\alpha$ of data onto each of $n$ nodes, with
the further property that at all times, a failed node can be
(functionally) repaired by downloading from each member of some
set of $r$ nodes an amount $\beta$ of data, so that at any time
during this ongoing process the original information can be fully
retrieved. Then what can be said about the maximum coding
rate $R = m/(n\alpha)$? The question can be formulated in terms
of a game played by two players, KILLER and BUILDER,
on the information flow graph as in [4]. Originally, the graph
consists of $n$ isolated live nodes. The two players move in
turn; KILLER moves by choosing a node and killing it, then
BUILDER moves by creating a new live node and connecting
to it from some set of $r$ live nodes of his choice. The aim of
KILLER is to force a cutset of small capacity, and BUILDER
tries to prevent that. Note that the maximum amount of
information that can be maintained in the storage system is
at most equal to the capacity of any cutset at any stage of
the game. The result of the game under optimal play by both
players thus provides an upper bound on $m$. In [27], we use
this game to prove the following results.

Theorem 4.1: With the above notation and assumptions, we
have the following.

1) If $\alpha = \beta$, then $R \le r/(r + 1)$. Equality holds for
(exact-repair) MDS codes with $n = r + 1$.

2) If $\alpha = r\beta$, then $R \le 1/2$. Equality holds for the
exact-repair-by-transfer linear storage codes in Example 3.2.
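
The equality case in part 1 can be illustrated with the simplest MDS code of length $n = r + 1$: a single-parity-check code over $\mathbb{F}_2$ with one bit per node, so that $\alpha = \beta = 1$. The sketch below is an illustration chosen here, not a construction taken from [27], and the function names `encode` and `repair` are hypothetical.

```python
from functools import reduce
from operator import xor


def encode(data_bits):
    """Store r data bits plus their XOR parity on n = r + 1 nodes, one bit per node."""
    return list(data_bits) + [reduce(xor, data_bits, 0)]


def repair(nodes, failed):
    """Exactly repair one node by downloading one bit from each of the other r nodes."""
    return reduce(xor, (b for i, b in enumerate(nodes) if i != failed), 0)


data = [1, 0, 1, 1]            # r = 4 information bits
nodes = encode(data)           # n = 5 nodes
all_repairable = all(repair(nodes, i) == nodes[i] for i in range(len(nodes)))
rate = len(data) / len(nodes)  # m / (n * alpha) = r / (r + 1)
```

Because the XOR of all $n$ stored bits is zero, every bit is the XOR of the other $r$ bits, which gives repair locality $r$ at rate $r/(r + 1)$.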

Theorem 4.2: With the same notation and assumptions, for
$r = 2$ we have that

$$ R \le \frac{\alpha + \beta}{3\alpha}. $$

More precisely, if $n = 3q - e$ with $e \in \{0, 1, 2\}$, then

$$ m \le q\alpha + (q - e)\beta. $$
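
To make the bounds for $r = 2$ concrete, the following Python sketch evaluates them numerically. It assumes the reading $R \le (\alpha + \beta)/(3\alpha)$ for the rate bound and $m \le q\alpha + (q - e)\beta$ with $n = 3q - e$, $e \in \{0, 1, 2\}$, for the refined bound; the function names are hypothetical and integer $\alpha$, $\beta$ with $\beta \le \alpha \le 2\beta$ are assumed.

```python
from fractions import Fraction


def max_information_r2(n, alpha, beta):
    """Upper bound on m for r = 2, writing n = 3q - e with e in {0, 1, 2}."""
    q = -(-n // 3)  # ceil(n / 3)
    e = 3 * q - n
    return q * alpha + (q - e) * beta


def rate_bound_r2(alpha, beta):
    """Upper bound (alpha + beta) / (3 * alpha) on the coding rate for r = 2."""
    return Fraction(alpha + beta, 3 * alpha)


# The refined bound on m never exceeds the rate bound when beta <= alpha <= 2 * beta.
consistent = all(
    Fraction(max_information_r2(n, alpha, beta), n * alpha) <= rate_bound_r2(alpha, beta)
    for n in range(3, 40)
    for beta in range(1, 5)
    for alpha in range(beta, 2 * beta + 1)
)
```

At the two extremes the rate bound specializes to the values in Theorem 4.1 with $r = 2$: `rate_bound_r2(1, 1)` is 2/3 and `rate_bound_r2(2, 1)` is 1/2.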

Note that the examples mentioned in Theorem 4.1 allow the
construction of codes of length $n = r + 1$ attaining the
bounds in all cases mentioned in the above theorems, as well
as construction of optimal (repetition) codes of lengths $n$
whenever $(r + 1) \mid n$.

Recently, in [28], [29], [30], generalizations of the cutset bound
from [4] have been derived in an information flow network
setting similar to the one in Section II, now for the case where
a number $s$ of nodes is repaired simultaneously. Here, during
repair each of the $s$ newcomer nodes is allowed to download
an amount $\beta_1$ of data from a set of live nodes of size $r$, and
subsequently an amount $\beta_2$ of data from each of the other
newcomer nodes. It would be interesting to generalize our
bounds to this more general setting.

V. Conclusion 

We have investigated the trade-off between the coding rate
$R$ and repair locality $r$ in the functional repair regime, in an
information flow network setting. Tight bounds are presented
for the two extreme cases $\alpha = \beta$, where $R \le r/(r + 1)$, and
$\alpha = r\beta$, where $R \le 1/2$, and for the case where $r = 2$.